Feature/agentic#1198
Overwrites AvailableModels with a custom agentic configuration for testing. Many dev-friendly choices in here won't make it to prod. Everything is still stored in Elasticsearch; no Postgres is required for now.

Subsessions instead of an AI Back Channel: I found that marking subsessions with their parent's session ID was easy and had immediate benefits. If we want an AI Back Channel, we can search for all sessions of type "delegation".

The DelegationKickoff: this "ToolResult" is the magic that makes SOC's Agentic AI work. The delegate tool hands back this result, which leaves the ToolRequest in an open state while the delegation happens in another session. When the client requests to run the delegation tool, it will receive new SSEs for delegation_started and delegation_resolved, because while the user is approving a tool in session A, they will receive messages from the sub-agent in session B in the same SSE stream.

Refactor the Tool interface to pass in the actual ToolRequest object instead of several parameters from it. All unit tests pass.

TODO:
- Put in an "Agentic" switch that turns this new behavior on or off.
- Limit sub-agent sessions by specifying max tokens.
- Specify a depth limit for delegation.
- Indicate sub-session context/token usage.
- Looping logic for automation that auto-approves tools.
- Skills.
- Memory.
- Move hardcoded bits to config.
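A minimal sketch of why marking subsessions with a parent session ID pays off quickly. It assumes a simplified in-memory `Session` type; the field names and the `"delegation"` kind value are illustrative, not SOC's actual Elasticsearch schema. The same parent links that answer "which delegations did this session spawn?" also yield a delegation depth, which a depth limit (one of the TODO items) could check before spawning another sub-agent.

```go
package main

import "fmt"

// Session is a hypothetical, trimmed-down session record; SOC's real sessions
// live in Elasticsearch and carry many more fields.
type Session struct {
	Id              string
	ParentSessionId string // empty for top-level sessions
	Kind            string // "delegation" marks a sub-agent session (assumed value)
}

// childDelegations returns the delegation subsessions spawned directly by parentId.
func childDelegations(sessions []Session, parentId string) []Session {
	var out []Session
	for _, s := range sessions {
		if s.Kind == "delegation" && s.ParentSessionId == parentId {
			out = append(out, s)
		}
	}
	return out
}

// delegationDepth counts the parent links between id and its top-level session.
// The hop cap (number of known sessions) stops the walk if links ever form a cycle.
func delegationDepth(byId map[string]Session, id string) int {
	depth := 0
	for hops := 0; hops < len(byId); hops++ {
		s, ok := byId[id]
		if !ok || s.ParentSessionId == "" {
			break
		}
		depth++
		id = s.ParentSessionId
	}
	return depth
}

func main() {
	sessions := []Session{
		{Id: "A", Kind: "chat"},
		{Id: "B", ParentSessionId: "A", Kind: "delegation"},
		{Id: "C", ParentSessionId: "B", Kind: "delegation"},
	}
	byId := map[string]Session{}
	for _, s := range sessions {
		byId[s.Id] = s
	}
	fmt.Println(len(childDelegations(sessions, "A")), delegationDepth(byId, "C"))
}
```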
Single agentic switch to toggle the AssistantCoordinator from the old behavior to the new behavior. Migrate SSE helper functions and their tests from gemini to sse.
Using `id@adapter` worked for simple LLM chat, but now that we're moving to agentic workflows it is limiting. Instead, this PR refactors the places where we refer to models to use the displayName. Since our models are defined by users, we were always going to rely on user-generated identifiers, and this should result in fewer naming collisions. Now we can refer to the same model through the same adapter, give it different prompts, tools, and personas, and still reference each configuration individually. Because historical sessions store the old ID, we still need to support it when reading from the database. New interactions will use the new ID going forward.
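A minimal sketch of resolving a model reference under the new scheme while still accepting the legacy form. The `Model` type and `resolveModel` are hypothetical. DisplayName is tried first; if nothing matches, the reference is split at the last `@` as an `id@adapter` pair. Because several display names can now share one ID and adapter, a legacy reference is inherently ambiguous, and this sketch simply returns the first match. That is a design choice, not necessarily what the PR does.

```go
package main

import (
	"fmt"
	"strings"
)

// Model is a hypothetical, trimmed-down AvailableModels entry.
type Model struct {
	Id          string // provider's model ID
	Adapter     string // adapter name
	DisplayName string // user-defined name used by new interactions
}

// resolveModel finds a model by DisplayName first. If nothing matches, it treats
// ref as a legacy "id@adapter" reference from a historical session and returns
// the first model with that ID and adapter.
func resolveModel(models []Model, ref string) (Model, bool) {
	for _, m := range models {
		if m.DisplayName == ref {
			return m, true
		}
	}
	if i := strings.LastIndex(ref, "@"); i > 0 {
		id, adapter := ref[:i], ref[i+1:]
		for _, m := range models {
			if m.Id == id && m.Adapter == adapter {
				return m, true
			}
		}
	}
	return Model{}, false
}

func main() {
	models := []Model{
		{Id: "llama3", Adapter: "ollama", DisplayName: "Analyst"},
		{Id: "llama3", Adapter: "ollama", DisplayName: "Reviewer"},
	}
	current, _ := resolveModel(models, "Reviewer")
	legacy, _ := resolveModel(models, "llama3@ollama")
	fmt.Println(current.DisplayName, legacy.DisplayName)
}
```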
Deduplicate code in the frontend and the backend: captureToolResult and prepareChatRequest. Check that DisplayName isn't empty on module startup and disable any models missing it, which simplifies checks elsewhere. Optimized calls to GetSession so executing a tool no longer calls it 3+ times or includes unnecessary metadata. Explicitly limit Gemini to request only 1 candidate and process only the first candidate. "Candidates" are how Gemini allows for multiple responses, expecting the user to select one to continue with and drop the others. Also cleaned up non-streaming tool use in preparation for supporting multiple tool requests. Fixed an SSE blockage that could occur when certain errors happen on the first chunk received from an LLM.
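The startup check can be sketched as a single pass over the configured models. `ModelConfig`, its `Enabled` flag, and `disableUnnamedModels` are hypothetical names. Treating a whitespace-only DisplayName as empty is an extra assumption beyond the PR text. After this pass, downstream code can assume every enabled model has a usable DisplayName, which is what lets the checks elsewhere be simplified.

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// ModelConfig is a hypothetical AvailableModels entry with an enabled flag.
type ModelConfig struct {
	Id          string
	DisplayName string
	Enabled     bool
}

// disableUnnamedModels runs once at module startup. Any model whose DisplayName
// is empty (or only whitespace) is disabled and logged. It returns how many
// models were disabled.
func disableUnnamedModels(models []ModelConfig) int {
	disabled := 0
	for i := range models {
		if strings.TrimSpace(models[i].DisplayName) == "" {
			log.Printf("disabling model %q: DisplayName is empty", models[i].Id)
			models[i].Enabled = false
			disabled++
		}
	}
	return disabled
}

func main() {
	models := []ModelConfig{
		{Id: "a", DisplayName: "Analyst", Enabled: true},
		{Id: "b", DisplayName: "", Enabled: true},
	}
	fmt.Println(disableUnnamedModels(models), models[1].Enabled)
}
```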
If a user refreshes their page or closes a browser tab during an AI request, they terminate the connection, leaving the AI interaction incomplete in a variety of ways depending on exactly when the client disconnected. These changes improve how the Go backend responds to these events by allowing the AI to finish, along with frontend changes that better adapt to in-progress states.
Agents are now defined on their own (hardcoded for now). Models are defined in AvailableModels in the config. AgentMappings link an Agent to a Model. We now send the agentic toggle, availableAgents, and agentMapping back to the UI. In agentic mode, Agents are presented to choose from instead of Models, and Agent names are used over the API instead of Model names. A conversation may be started with any agent.
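The Agent, Model, and AgentMapping relationship can be sketched as a two-step lookup: agent name to mapping, then mapping to model by DisplayName. All type and function names here are hypothetical stand-ins for the real config structures. Returning distinct errors for "no mapping" and "mapping to an unknown model" makes misconfiguration easier to diagnose.

```go
package main

import "fmt"

// Agent is a hypothetical hardcoded agent definition.
type Agent struct {
	Name   string
	Prompt string
}

// ModelConfig stands in for an AvailableModels entry, keyed by DisplayName.
type ModelConfig struct {
	DisplayName string
}

// AgentMapping links an Agent (by name) to a Model (by DisplayName).
type AgentMapping struct {
	Agent string
	Model string
}

// modelForAgent resolves the model an agent should run on.
func modelForAgent(agentName string, mappings []AgentMapping, models []ModelConfig) (ModelConfig, error) {
	for _, mp := range mappings {
		if mp.Agent != agentName {
			continue
		}
		for _, m := range models {
			if m.DisplayName == mp.Model {
				return m, nil
			}
		}
		return ModelConfig{}, fmt.Errorf("agent %q maps to unknown model %q", agentName, mp.Model)
	}
	return ModelConfig{}, fmt.Errorf("no model mapped for agent %q", agentName)
}

func main() {
	agents := []Agent{{Name: "orchestrator", Prompt: "..."}}
	models := []ModelConfig{{DisplayName: "Fast"}, {DisplayName: "Deep"}}
	mappings := []AgentMapping{{Agent: "orchestrator", Model: "Deep"}}
	for _, a := range agents {
		m, err := modelForAgent(a.Name, mappings, models)
		fmt.Println(a.Name, m.DisplayName, err)
	}
}
```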
The frontend received an update to support multiple tool requests in the same message. The UI plays it safe by approving each tool only after the previous one has run, but this was not enforced by the backend. Had the UI (or an API client) attempted to execute all tools at once, we were headed for a concurrent write problem. Now the backend enforces serialized tool requests on a per-session basis, meaning only one tool can execute at a time for any given session. If another request tries to execute a tool while one is already running, it immediately receives a 409 (Conflict) and should retry later. This retry has been added to the frontend, which tries once a second for 30 seconds. Removed attempts at limiting how many tools the LLM can call. Cleaning up before PR.
// A tool POST gets a 409 when another tool turn is already running for the session
// (the backend fails fast rather than blocking). Retry a bounded number of times
// with a short delay before surfacing an error.
toolBusyMaxRetries: 30,
These hardcoded values need to be configurable via config settings, since user AI servers will have very different loads. Defaults are OK, though.
toolBusyMaxRetries and toolBusyRetryDelayMs are now loaded from the info handler and client parameters.
This file has become too large. Plan to start splitting this up into smaller, organized scripts before additional work is done to it.
It has been split.
// every message after it is a tool_result (answering this turn's tools), so the model
// has not continued and the user has not moved on. Once a non-tool_result message
// follows, any unanswered tool from the turn was abandoned, not pending.
isActiveToolTurn(backendMessages, i) {
A lot of these new, smaller, more discrete functions don't appear to have unit test coverage. They are great candidates for it, though, and tests would help guard against bugs introduced by future maintainers.
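To illustrate how cheaply such predicates can be tested, here is the rule stated in the `isActiveToolTurn` comment, rendered as a Go sketch. The original is frontend JavaScript and its comment is truncated, so this encodes only the visible rule (every later message must be a tool_result). The `Message` type and role values are assumptions.

```go
package main

import "fmt"

// Message is a hypothetical, minimal chat message; only its role matters here.
type Message struct {
	Role string // e.g. "user", "assistant", "tool_result" (assumed values)
}

// isActiveToolTurn reports whether the turn at index i is still awaiting tool
// results: every message after it must be a tool_result. Once any other
// message follows, the turn's unanswered tools were abandoned, not pending.
func isActiveToolTurn(msgs []Message, i int) bool {
	if i < 0 || i >= len(msgs) {
		return false
	}
	for _, m := range msgs[i+1:] {
		if m.Role != "tool_result" {
			return false
		}
	}
	return true
}

func main() {
	msgs := []Message{{Role: "user"}, {Role: "assistant"}, {Role: "tool_result"}}
	fmt.Println(isActiveToolTurn(msgs, 1)) // true
	msgs = append(msgs, Message{Role: "user"})
	fmt.Println(isActiveToolTurn(msgs, 1)) // false
}
```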
// Populate a delegate tool's childSession from a stored sub-session's history:
// the sub-agent's prose becomes a collapsed thought, its tool calls become nested
// tool cards (with raw results), recursing into deeper delegations.
reconstructChildSession(delegateToolUse, sub, index, ownerSessionId) {
Longer, comprehensive logic blocks like this function should probably be broken apart to make them easier to read and easier to unit test.
</div>
</v-card>
</div>
<tool-use-card v-for="toolUse in message.toolUses" :key="toolUse.id" :tool-use="toolUse"></tool-use-card>
It's good to see more use of componentization like this.
// @Failure 403 "Insufficient permissions for this request"
// @Failure 500 "Internal SOC error; review SOC logs"
// @Router /connect/assistant/chat [post]
func (h *AssistantHandler) PostChat(w http.ResponseWriter, r *http.Request) {
This is another example of a function that has grown too large. HTTP handlers shouldn't be large or complex. They should parse inputs and respond with appropriate HTTP codes/errors, but otherwise delegate to a business-coordination function, which itself should call into more discrete functions.
RUN npm install jest jest-environment-jsdom --global

RUN if [ -f "src2/prompt_system.md" ]; then echo "compressing system prompt"; gzip -c src2/prompt_system.md > server/modules/assistant/SOSystemPrompt.bin; fi
RUN if [ -f "src2/prompt_agent_orchestrator.md" ]; then echo "compressing agent orchestrator prompt"; gzip -c src2/prompt_agent_orchestrator.md > server/modules/assistant/SOAgentOrchestratorPrompt.bin; fi
Would be cleaner to loop this or perhaps even combine prompts into a single file that Go could split apart dynamically. That would make it simpler to add future built-in agents.
There's now a single RUN command that iterates over the md files, packs them into a JSON object using jq, gzips the JSON, and drops it as SOAgenticPrompts.bin in the folder for embedding.
if isClientError(err) {
	web.Respond(w, r, http.StatusBadRequest, err.Error())
} else {
	web.Respond(w, r, http.StatusInternalServerError, "ERROR_UPSTREAM_SERVICE_ERROR")
Some of the error flows appear not to log the error. In some situations the client can at least see an error constant indicating a specific issue (INVALID_MODEL, TOO_LARGE, etc.). But in others, such as responding with ERROR_UPSTREAM_SERVICE_ERROR, it's not clear that an admin could adequately help troubleshoot what the upstream error actually was, since I don't see it being logged here or in the logic that calls down into these error helpers.
defer bodyWriter.Close()
processor.ensureFirstSend()
processor.writeText(notice)
processor.finalizeGemini("end_turn", nil)
Why is "gemini" referenced here, outside of the gemini adapter?
Fixed. By passing nil as the second parameter, we weren't doing anything Gemini-specific. Refactored it into a generic finalize function that's called by both gemini and openai.
toolBusyMaxRetries and toolBusyRetryDelayMs now get their values from the info endpoint.
No session should ever reach 10K direct subsessions.
Refactored how the SSE processor finalizes so there's a non-provider specific way to finalize without providing usage.
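A provider-neutral finalize with optional usage can be sketched by taking usage as a pointer, so callers with nothing to report (such as a canned notice) pass nil. `streamProcessor`, `Usage`, `finalizeWithCounts`, and the text output format are hypothetical; a real processor would emit SSEs rather than plain lines. A provider adapter would translate its own usage fields into the shared type and then delegate.

```go
package main

import (
	"fmt"
	"strings"
)

// Usage is hypothetical, provider-neutral token accounting.
type Usage struct {
	InputTokens  int
	OutputTokens int
}

// streamProcessor stands in for the SSE processor; out replaces the body writer.
type streamProcessor struct {
	out strings.Builder
}

// finalize ends the stream without knowing which provider produced it.
// usage is optional: nil means there is nothing to report.
func (p *streamProcessor) finalize(stopReason string, usage *Usage) {
	fmt.Fprintf(&p.out, "stop_reason=%s", stopReason)
	if usage != nil {
		fmt.Fprintf(&p.out, " input=%d output=%d", usage.InputTokens, usage.OutputTokens)
	}
	p.out.WriteString("\n")
}

// finalizeWithCounts shows how an adapter might delegate after converting its
// provider-specific usage fields.
func (p *streamProcessor) finalizeWithCounts(stopReason string, inputTokens, outputTokens int) {
	p.finalize(stopReason, &Usage{InputTokens: inputTokens, OutputTokens: outputTokens})
}

func main() {
	p := &streamProcessor{}
	p.finalize("end_turn", nil)               // e.g. a canned notice: no usage
	p.finalizeWithCounts("end_turn", 120, 45) // a provider path with usage
	fmt.Print(p.out.String())
}
```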
Simplified embedding by joining the prompts into a single JSON file, gzipping it, and embedding that one file instead of individual files. Each markdown file in the agentic folder becomes a field (sans extension) holding its contents. File names are case-sensitive. Added logging around prompt parsing.